We construct a universally Bayes consistent learning rule that satisfies differential privacy (DP). We first handle the setting of binary classification and then extend our rule to the more general setting of density estimation (with respect to the total variation metric). The existence of a universally consistent DP learner reveals a stark difference from the distribution-free PAC model. Indeed, in the latter, DP learning is extremely limited: even one-dimensional linear classifiers are not privately learnable in this stringent model. Our result thus demonstrates that by allowing the learning rate to depend on the target distribution, one can circumvent the above-mentioned impossibility result and, in fact, learn \emph{arbitrary} distributions by a single DP algorithm. As an application, we prove that any VC class can be privately learned in a semi-supervised setting with a near-optimal \emph{labeled} sample complexity of $\tilde{O}(d/\varepsilon)$ labeled examples (and with an unlabeled sample complexity that can depend on the target distribution).
The problem of learning threshold functions is a fundamental one in machine learning. Classical learning theory implies a sample complexity of $O(\xi^{-1} \log(1/\beta))$ (for generalization error $\xi$ with confidence $1-\beta$). The private version of the problem, however, is more challenging; in particular, the sample complexity must depend on the size $|X|$ of the domain. Progress on quantifying this dependence, via lower and upper bounds, was made in a line of works over the past decade. In this paper, we finally close the gap for approximate-DP and provide a nearly tight upper bound of $\tilde{O}(\log^* |X|)$, which matches a lower bound by Alon et al. (that applies even to improper learning) and improves over the prior upper bound of $\tilde{O}((\log^* |X|)^{1.5})$ by Kaplan et al. We also provide matching upper and lower bounds of $\tilde{\Theta}(2^{\log^*|X|})$ for the additive error of private quasi-concave optimization (a related and more general problem). Our improvement is achieved via the novel Reorder-Slice-Compute paradigm for private data analysis, which we believe will have further applications.
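For readers less familiar with the iterated logarithm $\log^*$ appearing in these bounds, a minimal sketch (function name ours, for illustration only) shows how slowly it grows:

```python
import math

def log_star(x: float, base: float = 2.0) -> int:
    """Iterated logarithm: the number of times the logarithm must be
    applied before the value drops to at most 1."""
    count = 0
    while x > 1.0:
        x = math.log(x, base)
        count += 1
    return count

# log* grows extremely slowly, so the tilde-O(log* |X|) bound above is
# essentially constant for any conceivable domain size:
print(log_star(2 ** 65536))  # 5, since 2^65536 -> 65536 -> 16 -> 4 -> 2 -> 1
```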
CountSketch and Feature Hashing (the "hashing trick") are popular randomized dimensionality reduction methods that support recovery of $\ell_2$-heavy hitters (keys $i$ where $v_i^2 > \epsilon \|\boldsymbol{v}\|_2^2$) and approximate inner products. When the inputs are {\em not adaptive} (do not depend on prior outputs), classic estimators applied to a sketch of size $O(\ell/\epsilon)$ are accurate for a number of queries that is exponential in $\ell$. When inputs are adaptive, however, an adversarial input can be constructed after $O(\ell)$ queries with the classic estimator, and the best known robust estimator supports only $\tilde{O}(\ell^2)$ queries. In this work we show that this quadratic dependence is in a sense inherent: we design an attack that after $O(\ell^2)$ queries produces an adversarial input vector whose sketch is highly biased. Our attack uses "natural" non-adaptive inputs (only the final adversarial input is chosen adaptively) and applies universally to any correct estimator, including one that is unknown to the attacker. In this, we expose an inherent vulnerability of this fundamental method.
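For concreteness, here is a textbook CountSketch with the classic median estimator, the kind of (non-robust) scheme such an attack targets; this is a generic illustration with names of our choosing, not the paper's construction or attack:

```python
import numpy as np

class CountSketch:
    """Textbook CountSketch: d independent rows, w buckets per row.
    Each key i gets one bucket and one random sign per row."""

    def __init__(self, d: int, w: int, n_keys: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.bucket = rng.integers(0, w, size=(d, n_keys))  # h_j(i)
        self.sign = rng.choice([-1, 1], size=(d, n_keys))   # s_j(i)
        self.table = np.zeros((d, w))

    def update(self, i: int, delta: float = 1.0):
        for j in range(self.table.shape[0]):
            self.table[j, self.bucket[j, i]] += self.sign[j, i] * delta

    def estimate(self, i: int) -> float:
        # Classic estimator: median over rows of the unbiased
        # per-row estimates s_j(i) * table[j, h_j(i)].
        d = self.table.shape[0]
        return float(np.median(
            [self.sign[j, i] * self.table[j, self.bucket[j, i]]
             for j in range(d)]))
```

With non-adaptive inputs the median is accurate for exponentially many queries; the result above shows that adaptively chosen inputs can bias any correct estimator on such a sketch after roughly $O(\ell^2)$ queries.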
The amount of training data is one of the key factors that determines the generalization capacity of a learning algorithm. Intuitively, one expects the error rate to decrease as the amount of training data increases. Perhaps surprisingly, natural attempts to formalize this intuition give rise to interesting and challenging mathematical questions. For example, in their classical book on pattern recognition, Devroye, Gyorfi, and Lugosi (1996) ask whether there exists a {monotone} Bayes-consistent algorithm. This question remained open for more than 25 years, until recently Pestov (2021) resolved it for binary classification, using an intricate construction of a monotone Bayes-consistent algorithm. We derive a general result for multiclass classification, showing that every learning algorithm A can be transformed into a monotone one with similar performance. Furthermore, the transformation is efficient and uses only black-box oracle access to A, thereby answering questions posed by Viering, Mey, and Loog (2019), Viering and Loog (2021), and Mhammedi (2021). Our transformation readily yields monotone learners in a variety of contexts: for example, it extends Pestov's result to classification tasks with an arbitrary number of labels, in sharp contrast with Pestov's work, which is tailored to binary classification. In addition, we provide uniform bounds on the error of the monotone algorithm, which makes our transformation applicable in distribution-free settings. For example, in PAC learning it implies that every learnable class admits a monotone PAC learner. This resolves questions of Viering, Mey, and Loog (2019); Viering and Loog (2021); and Mhammedi (2021).
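The abstract does not spell out the transformation, so the following is only a toy caricature of the black-box idea (all names ours, and NOT the paper's algorithm): a wrapper that feeds the learner growing prefixes of the data and keeps the hypothesis that performs best on held-out data, so its measured error never increases as data accumulates:

```python
import random

def monotone_wrapper(learner, data, holdout_frac=0.2, seed=0):
    """Toy caricature only: wrap a black-box `learner` so the returned
    hypothesis never gets worse on a held-out set as the amount of
    training data grows. The paper's construction is more subtle."""
    rng = random.Random(seed)
    data = list(data)
    rng.shuffle(data)
    cut = int(len(data) * holdout_frac)
    holdout, train = data[:cut], data[cut:]

    def holdout_error(h):
        return sum(h(x) != y for x, y in holdout) / max(1, len(holdout))

    best_h, best_err = None, float("inf")
    for m in range(1, len(train) + 1):
        h = learner(train[:m])          # black-box oracle access to A
        err = holdout_error(h)
        if err < best_err:
            best_h, best_err = h, err
    return best_h
```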
Clustering is a fundamental problem in data analysis. In differentially private clustering, the goal is to identify $k$ cluster centers without disclosing information about individual data points. Despite significant research progress, the problem has so far resisted practical solutions. In this work we aim to provide simple, implementable differentially private clustering algorithms that offer utility when the data is "easy," e.g., when there is significant separation between the clusters. We propose a framework that allows us to apply non-private clustering algorithms to easy instances and privately aggregate the results. In certain cases of Gaussian mixtures and $k$-means, we are able to improve upon existing sample complexity bounds. We complement our theoretical analysis with an empirical evaluation on synthetic data.
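The abstract does not detail the private aggregation step; as a generic illustration of privately combining non-private outputs, here is the standard Gaussian mechanism applied to an average of points assumed to lie in a ball of radius $R$ (function and parameter names ours):

```python
import numpy as np

def private_mean(points: np.ndarray, radius: float,
                 epsilon: float, delta: float, seed: int = 0) -> np.ndarray:
    """Standard (epsilon, delta)-DP Gaussian mechanism for the mean of
    n points assumed to lie in a ball of radius `radius`: replacing one
    point moves the mean by at most 2 * radius / n in l2 norm."""
    rng = np.random.default_rng(seed)
    n, dim = points.shape
    sensitivity = 2.0 * radius / n
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return points.mean(axis=0) + rng.normal(0.0, sigma, size=dim)
```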
Differentially private algorithms for common metric aggregation tasks, such as clustering or averaging, often have limited practicality due to their complexity or to the large number of data points required for accurate results. We propose a simple and practical tool, $\mathsf{FriendlyCore}$, that takes as input a set of points ${\cal D}$ from an unrestricted (pseudo) metric space. When ${\cal D}$ has effective diameter $r$, $\mathsf{FriendlyCore}$ returns a "stable" subset ${\cal C} \subseteq {\cal D}$ that includes all points, except possibly a few outliers, and is {\em certified} to have diameter $r$. $\mathsf{FriendlyCore}$ can be used to preprocess the input before privately aggregating it, potentially simplifying the aggregation or boosting its accuracy. Surprisingly, $\mathsf{FriendlyCore}$ is lightweight, with no dependence on the dimension. We empirically demonstrate its advantages in boosting the accuracy of mean estimation and of clustering tasks such as $k$-means and $k$-GMM, outperforming tailored methods.
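The abstract leaves the mechanism unspecified; purely to convey what a "stable" core means, here is a naive, non-private filter (names ours) that keeps points close to a strict majority of the dataset, so any two surviving points share a common neighbor and hence lie within distance $2r$ of each other:

```python
import numpy as np

def naive_core(points: np.ndarray, r: float) -> np.ndarray:
    """Non-private caricature of a stable core: keep each point that is
    within distance r of a strict majority of the dataset. Any two
    survivors then share a common neighbor, so their distance is at
    most 2r. FriendlyCore achieves a certified core *privately*."""
    n = len(points)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    close_counts = (dists <= r).sum(axis=1)
    return points[close_counts > n / 2]
```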
Classical streaming algorithms operate under the (not always reasonable) assumption that the input stream is fixed in advance. Recently, there has been growing interest in designing robust streaming algorithms that provide provable guarantees even when the input stream is chosen adaptively during execution. We present a new framework for robust streaming that combines techniques from two recently proposed frameworks, by Hassidim et al. [NeurIPS 2020] and by Woodruff and Zhou [FOCS 2021]. These frameworks rely on very different ideas, each with its own strengths and weaknesses. We combine the two into a single hybrid framework that obtains the "best of both worlds," thereby answering a question left open by Woodruff and Zhou.
Multilingual machine translation models can benefit from synergy between different language pairs, but they can also suffer from interference. While there is a growing number of sophisticated methods that aim to eliminate interference, our understanding of interference as a phenomenon is still limited. This work identifies the main factors that contribute to interference in multilingual machine translation. Through systematic experimentation, we find that interference (or synergy) is primarily determined by model size, data size, and the proportion of each language pair within the total dataset. We observe that substantial interference occurs mainly when the model is very small with respect to the available training data, and that using standard transformer configurations with fewer than one billion parameters largely alleviates interference and promotes synergy. Moreover, we show that tuning the sampling temperature to control the proportion of each language pair in the data is key to effectively balancing the amount of interference between low- and high-resource language pairs, and can lead to superior performance overall.
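The temperature-based sampling referred to here is, to our understanding, the common scheme in which language pair $i$ with $n_i$ examples is drawn with probability proportional to $(n_i/\sum_j n_j)^{1/T}$; a minimal sketch (the corpus sizes in the example are made up):

```python
def temperature_sampling_probs(sizes: dict, T: float) -> dict:
    """Temperature-based sampling over language pairs: T = 1 gives
    proportional sampling; larger T flattens the distribution and
    upweights low-resource pairs."""
    total = sum(sizes.values())
    weights = {pair: (n / total) ** (1.0 / T) for pair, n in sizes.items()}
    z = sum(weights.values())
    return {pair: w / z for pair, w in weights.items()}

# Hypothetical sizes for a high-resource and a low-resource pair:
print(temperature_sampling_probs({"en-fr": 40_000_000, "en-gu": 500_000}, T=5.0))
```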
Learned classifiers should often possess certain invariance properties meant to encourage fairness, robustness, or out-of-distribution generalization. However, multiple recent works empirically demonstrate that common invariance-inducing regularizers are ineffective in the over-parameterized regime, in which classifiers perfectly fit (i.e. interpolate) the training data. This suggests that the phenomenon of ``benign overfitting,'' in which models generalize well despite interpolating, might not favorably extend to settings in which robustness or fairness is desirable. In this work we provide a theoretical justification for these observations. We prove that -- even in the simplest of settings -- any interpolating learning rule (with arbitrarily small margin) will not satisfy these invariance properties. We then propose and analyze an algorithm that -- in the same setting -- successfully learns a non-interpolating classifier that is provably invariant. We validate our theoretical observations on simulated data and the Waterbirds dataset.
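The abstract does not name a specific regularizer; one canonical example of the invariance-inducing penalties it refers to is the IRMv1 penalty of Arjovsky et al. (2019), sketched here for concreteness (intended to be computed per environment and added to the risk):

```python
import torch
import torch.nn.functional as F

def irmv1_penalty(logits: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """IRMv1 penalty (Arjovsky et al., 2019): the squared gradient of
    the risk with respect to a dummy classifier scale. Roughly, it is
    small when the shared classifier is simultaneously near-optimal
    within the environment, which is what invariance asks for."""
    scale = torch.ones(1, requires_grad=True)
    loss = F.binary_cross_entropy_with_logits(logits * scale, y)
    grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
    return (grad ** 2).sum()
```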
Segmentation of regions of interest (ROIs) for identifying abnormalities is a leading problem in medical imaging. Using Machine Learning (ML) for this problem generally requires manually annotated ground-truth segmentations, demanding extensive time and resources from radiologists. This work presents a novel weakly supervised approach that uses binary image-level labels, which are much simpler to acquire, to effectively segment anomalies in medical Magnetic Resonance (MR) images without ground-truth annotations. We train a binary classifier using these labels and use it to derive seeds indicating regions likely and unlikely to contain tumors. These seeds are used to train a generative adversarial network (GAN) that converts cancerous images to healthy variants, which are then used in conjunction with the seeds to train an ML model that generates effective segmentations. This method produces segmentations that achieve Dice coefficients of 0.7903, 0.7868, and 0.7712 on the MICCAI Brain Tumor Segmentation (BraTS) 2020 dataset for the training, validation, and test cohorts, respectively. We also propose a weakly supervised means of filtering the segmentations, removing a small subset of poorer segmentations to obtain a large subset of high-quality ones. The proposed filtering further improves the Dice coefficients, reaching 0.8374, 0.8232, and 0.8136 for training, validation, and test, respectively.
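For reference, the Dice coefficient used to report these results measures the overlap between a predicted mask and the ground truth; a minimal sketch:

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray,
                     eps: float = 1e-7) -> float:
    """Dice = 2|P & G| / (|P| + |G|) for binary masks: 1.0 is a perfect
    match, 0.0 means no overlap."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))
```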